Integrating Approximate Summarization with Provenance Capture
Authors
Abstract
How to use provenance to explain why a query returns a result, or why a result is missing, has been studied extensively. Recently, we demonstrated how to answer both types of provenance questions uniformly for first-order queries with negation, and presented an implementation of this approach in our PUG (Provenance Unification through Graphs) system. However, for realistically sized databases, the provenance of answers and missing answers can be very large, overwhelming the user with too much information and wasting computational resources. In this paper, we introduce an (approximate) summarization technique that generates compact representations of why and why-not provenance. Our technique uses patterns as a summarized representation of sets of elements from the provenance, i.e., successful or failed derivations. We rank these patterns by their descriptiveness, using precision and recall as quality measures, and return only the top-k summaries. We demonstrate how this summarization technique can be integrated with provenance capture to compute summaries on demand, and how sampling techniques can be employed to speed up both the summarization and capture steps. Our preliminary experiments demonstrate that this summarization technique scales to large instances of a real-world dataset.
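The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the PUG implementation. The tuple encoding of derivations, the use of `None` as a placeholder, the F-score used to combine precision and recall, and all names and data below are assumptions made for illustration; the abstract states only that precision and recall serve as quality measures and that the top-k patterns are returned.

```python
from typing import List, Optional, Tuple

Derivation = Tuple[str, ...]
Pattern = Tuple[Optional[str], ...]  # None acts as a placeholder (assumption)


def matches(pattern: Pattern, derivation: Derivation) -> bool:
    """True if every constant in the pattern equals the derivation's value."""
    return len(pattern) == len(derivation) and all(
        p is None or p == d for p, d in zip(pattern, derivation)
    )


def score(pattern: Pattern,
          provenance: List[Derivation],
          others: List[Derivation]) -> Tuple[float, float]:
    """Return (precision, recall) of a pattern with respect to the provenance."""
    tp = sum(matches(pattern, d) for d in provenance)  # covered, in provenance
    fp = sum(matches(pattern, d) for d in others)      # covered, not in provenance
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / len(provenance) if provenance else 0.0
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the combination is an assumption)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def top_k(patterns: List[Pattern],
          provenance: List[Derivation],
          others: List[Derivation],
          k: int) -> List[Tuple[Pattern, float, float]]:
    """Rank candidate patterns by F-score and keep the k best."""
    scored = [(p, *score(p, provenance, others)) for p in patterns]
    scored.sort(key=lambda t: f_score(t[1], t[2]), reverse=True)
    return scored[:k]


# Hypothetical data: derivations in the provenance versus all other derivations.
PROVENANCE = [("seattle", "chicago"), ("seattle", "nyc"), ("boston", "nyc")]
OTHERS = [("seattle", "la"), ("boston", "la")]
CANDIDATES = [("seattle", None), (None, "nyc"), (None, None)]

if __name__ == "__main__":
    for pattern, precision, recall in top_k(CANDIDATES, PROVENANCE, OTHERS, k=2):
        print(pattern, round(precision, 2), round(recall, 2))
```

In this toy setting the all-placeholder pattern has perfect recall but low precision, which is why a combined measure is used for ranking. Computing the same scores over a random sample of derivations rather than the full sets would be one way to approximate them, in the spirit of the sampling mentioned in the abstract.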
Similar resources
WING-NUS at CL-SciSumm 2017: Learning from Syntactic and Semantic Similarity for Citation Contextualization
We present the system report for our model submitted to the shared task on Computational Linguistic Scientific-document Summarization (CL-SciSumm) 2017. We hypothesize that search- and retrieval-based techniques are sub-optimal for learning complex relations like provenance. State-of-the-art information retrieval techniques using term frequency-inverse document frequency (TF-IDF) to capture surfa...
Toward Provenance Capturing as Cross-Cutting Concern
Although provenance has gained much attention, solutions to capture provenance do not meet all the requirements. For instance, most solutions currently assume a closed world and are explicitly designed to capture provenance. Thus, they fail to integrate the provenance concern into existing environments. Hence, we argue that provenance should be considered as a cross-cutting concern that can easily b...
SGProv: Summarization Mechanism for Multiple Provenance Graphs
Scientific workflow management systems (SWfMS) are powerful tools in the automation of scientific experiments. Several workflow executions are necessary to accomplish one scientific experiment. Data provenance, typically collected by SWfMS during workflow execution, is important to understand, reproduce and analyze scientific experiments. Provenance is about data derivation, thus it is typicall...
Syntactic Query Models for Restatement Retrieval
We consider the problem of retrieving sentence level restatements. Formally, we define restatements as sentences that contain all or some subset of information present in a query sentence. Identifying restatements is useful for several applications such as multi-document summarization, document provenance, text reuse and novelty detection. Spurious partial matches and term dependence become imp...
A Role for Provenance in Social Computation
We argue that existing systems to support social computation suffer from a lack of transparency and that this can be addressed by integrating provenance capture mechanisms into such systems. We discuss how Semantic Web technologies can be used to facilitate this, and how the provenance record could be used to support various forms of decision-making about tasks such as workforce selection.
Publication date: 2017